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ABSTRACT 


In  this  report,  algorithms  for  spectral  unmixing  are  organized  into  taxonomies 
and  their  performance  is  then  compared.  Onr  motivation  is  to  collectively  organize 
and  relate  algorithms  in  order  to  assess  the  current  state-of-the-art  in  the  field  and 
to  facilitate  objective  comparisons  between  methods.  The  hyperspectral  sensing 
community  is  populated  by  investigators  with  disparate  scientific  backgrounds  and 
efforts  in  spectral  unmixing  developed  within  disparate  communities  have  inevitably 
led  to  duphcation.  This  report  is  intended  to  remove  ambiguity  and  redundancy 
by  using  a  standard  vocabulary,  and  clearly  summarize  what  has  and  has  not  been 
done.  As  will  be  evident,  the  framework  for  the  taxonomies  derives  its  organization 
firom  the  fundamental,  phnosophical  assumptions  imposed  on  the  problem,  rather 
than  the  common  calculations  they  perform,  or  the  similar  outputs  they  might  yield. 
The  taxonomies  are  supplemented  by  a  comparison  of  unmixing  performance  using 
techniques  that  typify  the  approaches  of  wide  classes  of  algorithms. 
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1.  INTRODUCTION 


Hyperspectral  immixing  is  the  procedure  by  which  pixel  spectra  in  a  scene  are  decomposed 
into  constituent  substances  and  the  fractions  in  which  they  appear.  The  collection  of  constituent 
members  is  comprised  of  materials  found  in  the  scene  that  are  deemed  pure  substances.  In  the 
strict  sense,  constituent  members  (or  endmembers)  can  be  unique  elements,  (e.g.,  iron,  copper), 
but  in  the  practical  sense  of  hyperspectral  imaging,  the  endmembers  more  likely  represent  disparate 
maoroscopic  entities  (e.g.,  sand,  trees,  vehicle  paint,  asphalt,  concrete). 

Each  pixel  in  a  hyperspectral  image  consists  of  observations  of  the  surface  radiance  in  adjacent 
electromagnetic  channels.  After  cahbration  and  atmospheric  compensation,  the  observations  con¬ 
vey  the  total  spectral  response  emanating  from  materials  within  eaoh  pixel.  Unmixing  algorithms 
attempt  to  extrapolate  the  components  that  comprise  each  mixed  pixel,  as  well  as  the  relative 
proportions  in  which  they  appear.  The  approaches  vary  in  their  methods,  and  through  their  al¬ 
gorithmic  formulations,  implicitly  incorporate  philosophical  assumptions  regarding  the  physical 
mechanisms  and  mathematical  structure  by  which  the  reflectance  properties  from  disparate  sub¬ 
stances  combine  to  yield  mixed  pixel  spectra.  Hence,  when  a  mixed  pixel  is  processed  by  different 
unmixing  algorithms,  the  disparity  of  assumptions  is  the  cause  of  the  range  of  output  values. 

Understanding  the  philosophical  differences  between  algorithms  is  important  when  comparing 
the  performance  of  different  algorithms.  Thus,  an  algorithm  taxonomy  provides  the  first  step  toward 
benchmarking  algorithm  performance  and  establishing  the  conditions  imder  which  some  algorithms 
fail  or  succeed. 


1.1  MOTIVATION 

The  motivation  for  writing  this  report  is  to  provide  a  clear  technical  imderstanding  of  the 
universe  of  spectral  unmixmg  algorithms  and  to  provide  a  comparison  of  the  performance  of  algo¬ 
rithms  with  actual  hyperspectral  data.  Hyperspectral  sensing  and  processing,  unlike  most  other 
sensing  paradigms,  resides  at  the  confluence  of  several  disciplines,  including,  but  not  limited  to, 
physics,  mathematics,  engineering,  and  statistics.  It  is  not  coincidental  that  algorithms  for  unmix¬ 
ing  h3q)erspectral  data  emphasize  the  strengths  and  biases  of  the  developers.  But,  the  lack  of  a 
common  vocabulary  has  resulted  in  many  duplicate  efforts.  In  this  sense,  the  taxonomies  strive  to 
distill  the  universe  of  known  algorithms  to  a  minimal  set. 

The  taxonomies  hierarchically  organize  the  algorithms  into  general  categories  based  on  their 
operating  assumptions.  Moreover,  because  the  taxonomies  reveal  that  large  munbers  of  algorithms 
within  the  same  class  are  often  derived  from  a  single  imderlying  formulation,  the  comparison  of 
their  performance  can  be  Hmited  to  a  single  representative,  or  ”best-of-class”,  algorithm.  Moreover, 
we  hope  the  taxonomies  provide  a  framework  for  future  algorithm  development,  clearly  indicating 
what  has  and  has  not  been  done.  As  will  become  evident  in  this  document,  algorithms  have  been 
organized  at  the  highest  levels  by  the  general  assumptions  they  implicitly  impose  on  the  problem. 
With  this  in  mind,  new  algorithms  should  fit  into  this  structure. 
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1.2  DOCUMENT  OUTLINE 


The  principle  contributions  of  this  report  are  three  taxonomies  (Figures  3,  4,  and  5)  for 
spectral  umnixing  algorithms  and  comparisons  of  performance  for  several  algorithms.  Section  2 
describes  two  fundamental  models  for  spectral  mixing,  linear  mixing  and  non-linear  mixing.  Sec¬ 
tion  3  introduces  the  taxonomies  for  three  procedures  crucial  to  linear  immixing,  dimensionality 
reduction,  endmember  determination,  and  inversion,  and  provides  a  technical  discussion  of  algo¬ 
rithms.  Section  4  discusses  non-linear  unmixing  algorithms.  In  Section  5,  the  results  of  performance 
comparisons  between  different  algorithms  are  presented.  Finally,  Section  6  summarizes  the  findings. 
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2.  PHYSICAL  MODELS  FOR  SPECTRAL  MIXING 


Any  approach  for  effectively  munixing  hyperspectral  data  must  begin  with  a  model  describ¬ 
ing  how  constituent  substances  in  a  scene  combine  to  form  the  composite  spectrum  appearing  in  a 
received  pixel.  Mixing  models  attempt  to  represent  the  underlying  physics  that  are  the  foimdation 
of  hyperspectral  phenomenology;  unmixing  algorithms  use  these  models  to  perform  the  inverse  op¬ 
eration,  attempting  to  recover  the  constituent  spectra  and  their  associated  fractional  abimdances 
from  the  mixed  pixel  spectrum.  We  present  two  mixing  models  that  each  offer  a  different  inter¬ 
pretation  of  how  incident  radiation  interacts  with  an  incident  surface.  Discussions  focusing  on  the 
mixing  models  from  both  an  engineering  [21]  and  physics  [8, 13, 14]  perspective  are  available  in  the 
literature. 


2.1  LINEAR  MIXING 

In  Figure  1(a),  the  reflecting  surface  is  portrayed  as  a  checkerboard  mixture,  and  the  incident 
radiation  only  bounces  once  on  its  surface.  If  the  total  surface  area  is  conceived  to  be  divided 
proportionally  according  to  the  fractional  abimdances  of  the  constituent  substances,  then  the  re¬ 
flected  radiation  will  convey  with  the  same  proportions  the  characteristics  of  the  associated  media. 
In  this  sense,  there  exists  a  linear  relationship  between  the  fractional  abundance  of  the  substances 
comprising  the  area  being  imaged,  and  the  spectra  in  the  reflected  radiation.  Hence,  this  is  called 
the  linear  mixing  model  (LMM),  and  is  expressed  as 

p 

X  =  ^  ttfcSfe  -f  w  =  Sa  +  w  (1) 

fc=i 

where  x  is  the  M  x  1  received  pixel  spectrum  vector,  Sk  is  the  fc-th  M  x  1  column  of  S,  a  is  the 
P  X  1  fractional  abundance  vector,  w  is  the  M  x  1  additive  observation  noise  vector,  S  is  the 
M  X  P  matrix  whose  columns  are  s/t,  M  is  the  number  of  spectral  bands,  and  P  is  the  number  of 
endmembers.  Frequently,  block  notation  will  be  employed  using  upper-case  variables  to  extend  (1) 
when  multiple  pixels  are  considered  so  that  X  —  S A  +  W.  Then  for  N  pixels,  X  is  a  M  x  AT  matrix 
whose  N  columns  are  the  M  x  1  pixel  spectra.  Likewise,  A  is  a  P  x  AT  matrix  of  abundances,  and 
W  is  a  M  X  AT  matrix  of  additive  noise  values. 


2.2  NON-LINEAR  MIXING 

Figure  1(b)  depicts  a  more  complicated  scenario.  The  arrangement  of  the  constituent  sub¬ 
stances  is  not  as  orderly  as  in  Figure  1(a)  because  the  substances  comprising  the  medium  are  not 
organized  proportionally  on  the  smrface.  This  intimate  mixture  results  when  each  component  is 
randomly  distributed  in  a  homogeneous  way.  As  a  result,  the  incident  radiation  can  experience  re¬ 
flections  with  multiple  substances,  and  the  aggregate  spectrum  of  reflected  radiation  may  no  longer 
uphold  the  linear  proportions  (either  in  mass  fraction  or  in  volume)  of  the  constituent  substance 
spectra.  Because  (1)  is  inappropriate  to  describe  this  interaction,  this  scenario  is  referred  to  as 
non-linear  mixing.  We  briefly  discuss  two  non-linear  models  for  spectral  mixing. 
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Figure  1.  Alternate  mixing  paradigms:  (a)  linear  mixing  from  a  checkerboard  mixture  having  a  single  reflection, 
(h)  non-linear  mixing  from  an  intimate  mixture  having  multiple  reflections. 


2.2.1  Two  Stream  Method 

Many  investigators  have  demonstrated  that  vegetation  and  soil  surfaces  are  not  well-described 
by  linear  mixing  models,  and  non-linear  models  are  required  to  describe  these  kinds  of  mixtures, 
which  are  treated  as  particulate  media.  One  approach  develops  a  model  for  scattering  from  par¬ 
ticulate  surfaces  which,  in  contrast  to  the  linear  mixing  model,  incorporates  multiple  scattering 
into  the  expression  for  the  bidirectional  reflectance  function  (BDRF)  for  a  surface  consisting  of 
particles  of  arbitrary  shape  in  close  proximity  to  one  another  [13].  The  collection  of  effects  induced 
by  multiple  scattering  in  a  homogeneous  medium  can  be  organized  under  a  single  parameter  known 
as  single-scattering  albedo,  w,  which  is  the  probability  that  a  photon  survives  interaetions  such 
as  Fresnel  reflection,  absorption,  scattering,  and  diffraction  due  to  the  presence  of  a  single  grain 
of  a  substance.  In  the  presence  of  multiple  substances,  the  average  single-scattering  albedo,  W,  of 
a  homogeneously  mixed,  multi-component  surface  can  be  approximated  at  each  wavelength  by  an 
expression  depending  on  the  fractional  abundances  in  the  mixture  as  [8] 


ELi 


w  ~ 


PkDk 


(2) 


where  pk  is  the  density  of  the  k-th  substance,  Dk  is  the  k-th  mean  grain  diameter,  and  is  the 
single-scattering  albedo  for  the  /c-th  substance. 


2.2.2  Radiosity 

An  altogether  different  approach  to  modeling  non-linear  mixing  employs  a  more  macroscopic 
viewpoint  than  the  one  for  particulate  media  in  2.2.1  [4].  In  contrast  to  the  two-stream  model 
in  Section  2.2.1  which  requires  scattering  parameters  of  particles  to  ascertain  the  single-scattering 
albedo  and  the  fractional  abundances  of  particulate  mixtures,  the  method  of  radiosity  balances 
equations  representing  the  transmitted  and  reflected  radiation  between  specific  geometries  of  soil 
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and  vegetation;  the  equations  can  be  solved  for  the  proportion  of  a  specific  substance  in  the  model 
that  reconciles  theory  with  the  collected  data.  The  solution  to  the  radiosity  equation  has  been 
formulated  for  simple  canopy  geometries  such  as  single-layers  of  vegetation  above  soil  where  multiple 
reflections  and  transmissions  occur  between  leaf  layer  and  soil,  layered  canopies,  and  rough  soil  [4] . 
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3.  ALGORITHM  TAXONOMIES:  LINEAR  UNMIXING 


We  return  to  the  linear  mixing  model  in  (1).  In  its  most  complete  form,  linear  unmixing  is 
a  sequence  of  three  consecutive  steps  that  estimates  the  spectra  of  the  constituent  substances,  or 
endmembers,  and  the  associated  fractional  abundances  for  each  pixel  in  the  scene: 

Dimension  reduction  Reduce  the  dimension  of  the  data  in  the  scene.  This  step  is  optional  and 
is  only  invoked  by  some  algorithms  in  order  to  reduce  the  computational  load  of  subsequent 
steps. 

Endmember  determination  Estimate  the  distinct  spectra  that  constitute  the  mixed  pixels  in 
the  scene. 

Inversion  Estimate  the  fractional  abundances  of  each  mixed  pixel  from  its  spectrum  and  the 
endmember  spectra. 

Most  of  the  algorithms  discussed  in  this  document  address  only  one  of  these  steps.  In  some  cases, 
however,  an  algorithm  may  estimate  endmembers  and  abundances  concurrently,  effectively  per- 
forming  two  steps  simultaneously. 


3.1  TAXONOMY  STRUCTURE 

In  creating  the  taxonomies  in  Figures  3,  4,  and  5,  we  have  attempted  to  both  group  together 
as  well  as  differentiate  algorithms  by  their  characteristics.  From  extensive  compilations  of  their 
properties,  algorithms  were  then  hierarchically  categorized  by  three  features,  depicted  in  Figure  2, 
that  highlight  important  philosophical  interpretations  of  the  unmixing  problem.  Figures  3,  4,  and  5 
all  share  the  top-down  hierarchical  structure,  whose  features  and  categories  are  defined  as  follows: 

Interpretation  of  data  Indicates  how  an  algorithm  interprets  mixed  pixel  spectra. 

•  Non-statist ical  vs.  Statistical:  If  an  algorithm  employs  statistical  measures  (e.g., 
means, covariances)  to  quantify  the  aggregate  behavior  of  hyperspectral  data,  it  is  statis¬ 
tical.  Implicitly,  statistical  algorithms  process  mixed  pixels  by  comparing  their  features 
to  those  belonging  to  a  known  class.  This  distinction  becomes  important  in  applications 
like  detection  where  aggregate  measures  can  obscure  the  features  of  rare  targets. 

Description  of  randomness  Indicates  how  an  algorithm  incorporates  the  randomness  of  the 
data. 

•  Non-parametric  vs.  Parametric:  If  an  algorithm  presumes  the  random  behavior 
of  hyperspectral  data  is  described  by  a  specific,  parameterized  probabihty  distribution 
function,  it  is  parametric.  An  algorithm  that  is  statistical  is  not  always  parametric, 
although  the  converse  must  be  true.  For  example,  algorithms  that  empirically  estimate 
means  and  covariances  without  assuming  a  specific  model  for  the  underlying  density  are 
statistical,  but  non-parametric. 
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Figure  2.  Conceptual  structure  for  taxonomies. 

Optimization  criteria  Indicates  what  objective  function  is  being  optimized. 

•  Non-squared-error  and  squared-error:  If  an  algorithm  employs  squared-error  as  a 
distance  measure,  it  is  classified  as  using  squared-error. 

•  Maximum  Likelihood  (ML):  Parametric  algorithms  that  estimate  deterministic  quan¬ 
tities  by  maximizing  a  likelihood  function  derived  from  a  forward  density  are  considered 
ML. 

•  Maximum  a  Posteriori  (MAP):  Parametric  algorithms  that  estimate  random  quan¬ 
tities  by  maximizing  a  likelihood  function  derived  from  a  posterior  density  are  considered 
MAP. 

Further  discrimination  in  the  taxonomies  beneath  the  structure  in  Figure  2  is  accomplished  by 
examining  features  pertaining  to  I)  the  primary  output  of  the  algorithm,  2)  the  required  inputs, 
and  3)  and  the  characterization  of  noise.  More  detailed  information  for  each  taxonomy  appears  in 
its  respective  section. 

Finally,  the  taxonomies  enumerate  the  remaining  features  of  unmixing  algorithms  that  are 
not  incorporated  into  the  hierarchical,  taxonomy  structure.  They  have  been  coarsely  grouped  into 
categories  related  to  1)  a  priori  information  required  as  inputs  for  an  algorithm,  2)  output  features, 
3)  additional  algorithm  outputs  besides  the  intended  result,  and  4)  processing  and  architectural 
requirements. 

3.2  DIMENSION  REDUCTION  TAXONOMY 

In  this  section,  we  consider  techniques  to  reduce  the  dimension  of  hyperspectral  data.  The 
taxonomy  for  this  family  of  algorithms  appears  in  Figure  3.  In  addition  to  the  three  levels  of 
structure  imposed  by  the  features  in  Figure  2,  three  more  features  provide  further  discrimination 
for  dimension  reduction  algorithms:  1)  the  output  axes  for  compression,  2)  the  required  input  data, 
and  3)  the  kind  of  noise,  if  any,  in  the  signal  model. 
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Figure  3.  Taxonomy  of  dimension  reduction  algorithms. 
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The  simplest  form  of  dimension  reduction  discards  unwanted  bands  altogether.  This  approach 
is  useful  when  the  utility  of  specific  bands  is  well  known,  or,  conversely,  when  irrelevant  or  unreli¬ 
able  bands  are  identifiable.  Such  decisions  are  usually  justified  by  applications  demanding  strong 
physical  knowledge.  In  Heu  of  such  knowledge,  the  dimension  reduction  algorithms  discussed  here 
attempt  to  compress  hyperspectral  data  based  on  alternative  criteria. 

3.2.1  Non-statistical  and  Non-parametric  Dimension  Reduction 

Non-statistical  techniques  strive  to  compress  hyperspectral  data  by  identifying  the  useful 
components  of  pixel  spectra,  but  they  avoid  using  aggregate  measurements  of  the  data  (e.g.,  means, 
covariances)  as  the  basis  for  calculations.  Instead,  they  rely  on  ‘‘growing”  and  “pruning”  the  useful 
subspace  from  a  stream  of  consecutive  pixels.  While  optimality  may  be  lost  in  one  context,  these 
methods  are  often  better  suited  to  real-time  implementation  because  they  avoid  the  latency  intrinsic 
to  statistical  methods. 


3.2. 1.1  Squared-error  In  this  section  we  consider  one  algorithm  that  employs  a  measure  of 
squared-error  to  determine  the  axes  for  compression  of  pixel  spectra. 

ORASIS  A  non-statistical  technique  for  reducing  the  dimension  of  hyperspectral  data  ap¬ 
pears  in  the  Naval  Research  Laboratory’s  ORASIS  (Optical  Real-time  Adaptive  Spectral  Identi¬ 
fication  System),  which  is  a  series  of  hyperspectral  processing  modules  [5,6].  In  the  Exemplar 
Selector  Module  (ESM),  when  a  new  pixel  is  collected  from  the  scene  by  the  sensor,  its  spectra  is 
compared  to  the  existing  set  of  exemplars  (the  first  pixel  in  a  scene  automatically  becomes  the  first 
exemplar).  If  the  new  pixel  is  sufficiently  dffierent  -  the  usual  measure  of  distance  is  spectral  angle 
“  from  each  of  the  existing  exemplars  based  on  a  prescreen  threshold,  it  is  added  to  the  collection. 
If  it  is  not  sufficiently  different,  the  exemplar  set  remains  imchanged.  At  periodic  intervals,  the 
Salient  Selection  Module  (SSM)  then  orthogonalizes  the  current  set  of  exemplars  using  a  modified 
Gram-Schmidt  process  where  a  permitted  representation  threshold,  determines  what  subset  of  the 
orthogonal  vectors  are  retained  to  reduce  the  dimension  of  the  current  set  of  exemplars.  In  this 
reduced  dimension,  the  exemplars  are  submitted  to  another  module  to  geometrically  determine 
endmembers. 

The  performance  of  the  algorithm  is  linked  to  the  value  of  the  prescreen  angle  threshold  which 
is  used  to  admit  exemplars  and  to  the  representation  threshold  that  defines  the  dimension  of  the 
projected  space.  Smaller  values  of  the  prescreen  threshold  admit  more  exemplars,  and  generally,  add 
more  dimensions  to  the  exemplar  subspace.  This  parameter  may  be  timed  to  admit  different  levels 
of  variability  in  the  scene  into  the  exemplar  set.  The  representation  threshold,  however,  defines  the 
required  dimension  of  the  projected  space  to  preserve  a  minimum  fidelity  in  the  exemplars  and  is 
important  to  the  subsequent  module  for  determining  endmembers 

3.2.2  Statistical  and  Non-parametric  Dimension  Reduction 

In  this  section  we  present  statistical  techniques  for  reducing  the  dimension  of  hyperspectral 
data  that  possess  no  parametric  models.  The  axes  for  compression  in  aU  of  these  algorithms  are 
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revealed  by  performing  an  eigendecomposition  on  covariance  matrices.  Featmres  of  targets  that 
occur  with  a  relatively  low  probabihty  may  be  lost  using  statistical  compression  techniques,  and 
the  requisite  processing  may  be  imsuitable  for  real-time  implementations.  However,  the  optimal 
properties  associated  with  statistical  methods  make  them  desirable. 


3.2.2. 1  Squared-error  We  consider  a  technique  for  reducing  the  dimension  of  pixels  in  an 
image  that  directly  exploits  the  squared-error  approximation  properties  of  the  eigendecomposition 

[31]. 


Principal  Component  Analysis  Principal  component  analysis  (PCA)  performs  an  eigen¬ 
decomposition  on  the  covariance  of  the  data  and  determines  how  the  power  in  a  signal  is  allocated. 
Letting  x(n),  n  =  1, . . . ,  AT,  denote  the  spectra  of  N  mixed  pixels,  the  covariance,  is  calculated 
as 

Tx  =  E[{-x  -  /x:r)(x  -  ^  ^(x(n)  -  Mx)(x(n)  -  (3) 

n=l 

where  is  the  mean  vector  of  the  pixel  set.  The  resulting  eigendecomposition  can  be  expressed  as 
Tx  ==  USU^,  where  U  is  a  unitary  matrix  of  eigenvectors,  and  E  is  a  diagonal  matrix  of  eigenvalues. 
The  magnitude  of  the  eigenvalues  indicates  the  power  residing  in  the  data  along  the  component  of 
the  data  parallel  to  the  associated  eigenvector.  The  larger  eigenvalues  identify  basis  components 
whose  absence  in  x  is  directly  proportional  to  a  larger  average  squared-error.  Hence,  the  effective 
dimensionality  of  the  data  can  be  estimated  by  counting  the  number  of  non-zero  eigenvalues.  If 
the  data  is  transformed  by  pre-multiplication  with  U^,  then  the  resulting  linear  transformation 
moves  a  to  a  new  system  of  decorrelated  variables  oriented  along  the  eigenvectors  in  U.  The  new 
vector  of  variables  can  be  tnmcated  to  retain  only  those  having  significant  eigenvalues.  The  result 
is  a  lower-dimensional  multivariate  random  vector  that  conveys  most  of  the  power  in  the  original, 
higher-dimensional  system. 


3.2.2.2  Signal-to-Noise  Ratio  We  examine  an  alternative  technique  that  maximizes  signal- 
to-noise  ratio  rather  than  minimizing  squared-error,  but  still  employs  the  eigendecomposition  as 
its  principle  computational  tool. 


Maximum  Noise  Fraction/Noise  Adjusted  Principal  Components  One  technique 
for  compression  that  is  also  based  on  the  eigendecomposition  produces  an  ordered  decomposition 
of  a  signal  into  components  based  on  the  ratio  of  noise  power  to  signal  power  [12].  Provided 
the  second-order  statistics  of  the  noise  in  (1)  are  known,  let  the  noise  and  signal  and  noise  be 
xmcorrelated  and  known  by  SFaS*^  and  Fw,  respectively,  such  that  x  has  a  covariance  given  by 
F^  —  SFaS*^  +  F^y.  The  component  of  the  received  signal,  v,  possessing  maximum  noise  fraction 
maximizes 


v^r^v 


(4) 
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The  maximum  noise  fraction  (MNF)  transform  is  a  linear  transformation  that  identifies  and 
orders  signal  components  by  their  noise  fraction.  The  eigenvalues  of  are  the  noise  fractions 

associated  with  each  left-hand  eigenvector.  Consequently  the  left-hand  eigenvectors  of 
comprise  a  linear,  orthogonal  transform  that  orders  the  components  of  a  signal  by  their  noise 
fractions.  This  is  in  direct  contrast  to  PCA  where  no  noise  model  is  used,  and  the  components 
are  ordered  only  by  their  signal  power.  In  MNF,  the  largest  eigenvalues  identify  components  of 
the  signal  exhibiting  the  highest  noise  fractions,  while  smaller  eigenvalues  identify  the  opposite.  It 
is  important  to  note  that  the  noise  fraction  says  nothing  about  that  the  absolute  signal  or  noise 
levels.  This  information  is  obtained  from  the  PCA  of  the  individual  covariance  matrices. 

The  Noise  Adjusted  Principal  Components  (NAPC)  transform  [20]  orders  components  in 
a  signal  by  decreasing  signal-to-noise  ratio  (SNR).  Aside  from  the  ordering  of  the  components, 
the  transform  is  mathematically  equivalent  to  the  MNF  transform  and  achieves  the  exact  same 
result.  As  in  PCA,  the  ordering  of  components  can  be  used  to  estimate  one  type  of  effective  signal 
dimensionality,  and  the  set  of  random  variables  obtained  after  the  MNF  transform  can  be  trxmcated 
to  retain  only  those  that  have  a  useful  SNR. 


3.3  ENDMEMBER  DETERMINATION  TAXONOMY 

In  this  section,  we  examine  different  endmember  determination  algorithms.  The  taxonomy 
in  Figure  4  organizes  the  algorithms  along  the  lines  of  the  features  in  Figure  2.  To  add  further 
structure,  four  additional  features  have  been  included:  1)  the  type  of  endmembers  being  derived,  2) 
the  number  of  endmembers,  P,  estimated  by  an  algorithm,  with  respect  to  the  number  of  spectral 
bands,  M,  and  the  number  of  pixels  being  processed,  3)  the  required  input  data,  and  4)  the  the 
kind  of  noise,  if  any,  in  the  signal  model. 

3.3.1  Non-statistical  and  Non-parametric  Endmember  Determination 

In  this  section  we  consider  two  ways  to  geometrically  interpret  hyperspectral  data.  Both 
methods  derive  endmembers  from  a  scene  without  statistical  measures  or  parametric  models. 


3.3. 1.1  Geometric  Approaches  A  purely  geometric  approach  toward  endmember  determi¬ 
nation  has  been  explored  to  exploit  the  strong  parallelism  between  the  linear  mixing  model  and 
the  theory  of  convex  sets  [3,9].  Hyperspectral  data  from  a  scene  reside  within  a  volume  in  a 
multi-dimension  hyperspace;  endmembers  are  estimated  from  the  most  extreme-valued  pixels  in 
the  scene,  so  that  any  point  within  the  volume  is  a  non-negative  sum  of  those  endmembers. 

The  determination  of  endmembers  consists  of  two  steps.  The  first  requires  finding  a  minimum 
subset  of  pixels  in  the  scene  that  describe  the  range  of  variation  in  the  scene.  The  convex  hull  of 
the  pixels  serves  this  purpose,  and  is  always  unique  for  every  scene,  although  various  methods  exist 
for  its  calculation  [26],  The  second  step  involves  fitting  a  simplex  aroimd  the  convex  hull  that 
satisfies  a  criteria  for  ‘best  fit,’  and  unlike  the  convex  hull,  the  simplex  is  not  unique.  The  simplex 
is  a  geometric  surface  in  an  n-dimensional  hyperspace  defined  by  exactly  n  +  1  vertices.  A  simplex 
enclosing  the  convex  hull  of  pixels  in  a  scene  can  be  arbitrarily  large,  but  if  the  enclosed  volume  is 
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Figure  4-  Taxonomy  of  endmember  determination  algorithms. 
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minimized  by  adjusting  its  faces,  its  vertices  can  be  interpreted  as  the  locations  of  pure  spectra  in 
the  scene  and  may  be  used  as  endmembers  in  inversion. 

Two  different  algorithms  for  estimating  endmembers  geometrically  are  derived  from  an  itera¬ 
tive  technique  for  encapsulating  the  data  cloud  with  a  simplex  having  minimum  volume  [9].  They 
are  distinguished  by  a  priori  knowledge  of  a  dark  point  that  is  defined  as  the  response  of  the  sensor 
to  a  target  of  zero  refiectance. 


Minimum  Volume  Transform:  Dark-Point-Fixed  Transform  The  dark-point-fixed 
transform  (DPFT)  [9]  is  presented  for  the  scenario  where  the  the  dark  point  is  known  and  the 
co-ordinate  axes  can  be  adjusted  to  that  point;  put  differently,  the  origin  is  permitted  to  be  one 
vertex  of  the  simplex.  First,  the  h3q)erplane,  H  =  {xif  €  :  u^x^  =  1},  is  introduced,  where  u 

is  an  M  X  1  vector  of  all  ones  whose  length  equals  the  number  of  bands,  and  xh  is  the  normalized 
projection  of  a  pixel,  x,  lying  in  H,  The  projection  of  a  set  of  pixels  on  H  identifies  a  region  to  be 
enclosed  by  a  new  set  of  co-ordinate  vectors,  also  residing  on  J?,  that  encapsulate  the  entire  set  of 
points,  but  with  a  minimum  volume.  If  S  is  an  M  x  M  matrix  whose  columns,  Sfc,  also  reside  on 
H,  and  Q  =  S”^,  then  u^S  =  u^  and  u^Q  =  u^.  The  volume  of  the  simplex,  5,  formed  by  the 
columns  of  S  and  capped  by  H  is 

volume(5)  =  .  (5) 

If  x/f  represents  the  value  of  pixels  in  the  standard,  orthonormal  axes,  then,  in  the  co-ordinate 
system  of  the  non-orthogonal  columns  of  S,  the  equivalent  co-ordinate  values  are  the  mixing  coef¬ 
ficients,  3.H  =  S“^Xif  =  Q  X//. 

The  goal  is  to  nainimize  the  volume  of  5,  but  by  (5),  we  can  alternately  maximize  |Q|  by 
iteratively  adjusting  each  of  its  rows.  This  is  equivalent  to  adjusting  one  facet  of  the  simplex  by 
manipulating  its  surface  normal  while  keeping  the  others  fixed.  This  technique  for  maximizing  [Q] 
is  shown  to  be  equivalent  to  maximizing  the  product  of  a  constrained  set  of  variables,  and  numerical 
approaches  from  linear  programming  are  used  [9].  However,  in  the  more  general  situation,  the  dark 
point  is  unknown,  and  the  minimum  volume  simplex,  defined  by  S,  can  no  longer  assume  the  origin 
is  an  endmember,  requiring  S  to  explicitly  contain  an  additional  column.  Since  the  number  of 
vertices  in  a  simplex  exceeds  the  dimension  of  the  data,  M,  by  one,  the  new  matrix  of  endmembers, 
S,  is  now  M  X  (M  -h  1). 


Minimum  Volume  Transform:  Fixed-Point-Free  Transform  By  contrast,  in  the 
fixed-point-free  transform  (FPFT)  [9],  the  additional  endmember  is  accommodated  in  S  by  adding 
a  row  of  ones  to  S,  making  it  an  (M  -f  1)  x  (M  H- 1)  matrix,  Saup.  The  implication  of  is  that 
the  simplex,  5,  residing  in  3?^  is  embedded  within  a  simplex,  T,  in  having  one  vertex  at  the 

origin  and  the  others  at  co-ordinates  represented  by  the  columns  of  Saug-  The  volume  associated 
with  T  is  given  by 

volume(r)  =  -  volume(5)  =  v  ^ — -r  — (6) 

^  ^  M-hl  M(M-hl)  Ml 
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where  (5)  has  been  invoked.  So,  minimizing  \Saug  \  equivalently  minimizes  the  volume  of  S  using  the 
theory  of  determinants.  The  same  method  of  individually  modifying  each  of  the  surface  normals 
is  employed  for  the  FPFT  where  now  =  ^aug-  M  +  1  endmembers  for  the  simplex  of 
interest  in  3?^  can  be  obtained  from  the  columns  of  the  final  version  of  by  retaining  only  the 
first  M  rows. 

Penalty-Based  Shrinkwrapping  The  DPFT  and  FPFT  build  simphces  having  minimum 
volume.  Some  pixels  in  the  scene,  however,  may  reside  very  close  to  the  facets  of  the  simplex, 
causing  potential  instability  during  inversion.  One  method  for  avoiding  this  scenario  strives  to 
build  a  simplex  having  minimum  volume,  but  is  constrained  by  a  penalty  function  that  discomages 
facets  from  residing  too  close  to  data  points  [11]. 

Adopting  block  notation  for  N  pixels,  we  let  S  be  the  M  x  (M  + 1)  matrix  whose  columns  are 
endmembers.  S,  however,  is  not  square,  and,  hence,  is  not  invertible.  The  fractional  abundances  of 
each  pixel  are  constrained  to  be  positive  and  must  sum  to  1.  This  latter  constraint  can  be  appended 
to  the  block  formulation  of  (1),  yielding 


where  1  is  a  column  vector  of  ones,  and  0  is  a  column  vector  of  zeroes.  The  synthesis  equation 
can  be  expressed  as  X  =  SA  +  W.  An  estimate  of  A  can  be  recovered  by  a  simple  inversion, 
A  =  S^^X. 

The  objective  function,  i?,  involves  two  terms,  a  volume  term  and  a  penalty  term,  and  is 
defined  as 

H{S,  X)  =  V{S)  +  aF(S,  X).  (8) 

V{S)  is  the  volume  of  the  simplex,  calculated  using  (5)  and  (6).  F(S,X)  is  a  penalty  fimction 
enforcing  the  constraint  that  all  data  points,  X,  reside  inside  the  simplex.  The  parameter,  a, 
weighs  the  impact  of  the  penalty  on  ff(S,  X).  Intuitively,  F(S,  X)  should  be  small  when  the  facets 
of  the  simplex  are  far  from  the  points  in  X,  and  large  when  the  opposite  is  true. 

The  penalty  function  is  chosen  as 

F(s,x)  =  f;x;i.  (9) 

i=i  i=i 

In  order  to  determine  a  value  of  S  that  minimizes  i?(S,X),  the  gradient  of  (8)  with  respect  to 
the  columns  of  S  is  set  to  zero  and  solved  for  a  recursion  that  leads  to  a  final  estimate  for  S  that 
minimizes  i?(S,X). 

3.3.2  Statistical  and  Non-parametric  Endmember  Determination 

In  this  section,  we  consider  algorithms  that  do  not  assume  a  parametric  model  for  randomness, 
but  still  use  statistical  information  to  estimate  endmembers. 
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3.3.2. 1  Squared-error  In  this  section,  we  consider  non-parametric  algorithms  that  estimate 
endmembers  in  a  scene  grouped  loosely  around  the  concept  of  minimizing  a  squared-error  criterion. 
The  discussion  of  the  least  squares  problem  is  well-documented  [19].  Although  other  techniques, 
especially  those  involving  Gaussian  distributions,  ultimately  minimize  a  quadratic  metric,  we  defer 
the  discussion  of  parametric  models  to  later  sections. 


Block  Least  Squares  We  consider  the  problem  of  estimating  the  endmembers  in  a  scene 
when  both  spectral  observations  and  ground  class  abundance  measurements  are  collected.  In 
essence,  this  is  the  case  of  having  x  (measurements)  and  a  (groimd  truth)  for  a  set  of  pixels, 
and  estimating  S  in  (1).  Depending  on  how  the  sources  of  error  are  modeled,  two  different  least 
squares  estimators  can  be  derived  [15].  We  use  (1)  as  a  starting  point  and  adopt  block  notation  for 
N  pixels,  with  the  goal  of  developing  an  estimate  for  S  that  minimizes  jjX  -  SA|||..  Here,  ||  •  |||r  is 
the  Proebenius  norm  [31]. 


We  develop  a  least  squares  error  estimate  for  S  in  hght  of  the  fact  that  X  =  'X.exact  +  terror 
and  A  =  Aexact  +  terror-  Errors  in  both  values  may  occur  for  a  number  of  reasons.  Xerror  maybe 
an  instrumentation  error  corrupting  the  observation  in  X  and  Aerror  may  be  the  human  error  in 
assessing  groimd  class  proportions  distorting  the  measurement  in  A.  We  now  explore  different 
estimates  for  S  based  on  the  assumption  that  one  or  both  errors  are  present  and  assuming  that 


X, 


exact 


-SA, 


exact' 


First,  we  generate  a  least  squares  error  estimate  of  S  based  on  the  assumption  that  X  possesses 
only  an  observation  error,  in  which  case  X  =  Xezarf  +  Xerror  =  S  Aeiorf  +  W .  We  can  recover 
an  estimate  for  S  by  multiplying  both  sides  by  the  pseudo-inverse  [22]  of  A,  defined  as  A+  = 
(A”^A)“^A^.  This  leads  to  the  block  least  squares  estimate  for  the  endmembers,  Sbls  =  XA"*". 


Total  Block  Least  Squares  If  both  A  and  X  are  assumed  to  be  noisy,  then  X  =  Xexact  + 
Xerror  =  S Aexoct  +  S  Agrror  +  W.  We  Can  extend  the  analysis  for  Sbls  to  obtain  a  block  total  least 
squares  estimate,  Sbtls,  which  accounts  for  the  measmement  error  in  A.  We  can  rewrite  (1)  as 

[A^X^][S  -lf  =  C[S  -If  =  0  (10) 

where  C  =  [A^X^]  is  an  AT  x  (M  +  P)  matrix  whose  rows  are  the  concatenation  of  N  abimdance 
measurements  with  their  associated  pixel  spectriun  observations. 

Letting  the  singular-value  decomposition  (SVD)  [31]  of  C  =  USV^,  V  may  be  partitioned 
as 


Vn  Vi2 

V21  V22 


(11) 


where  Vu  is  P  x  P  and  V22  is  M  x  M.  It  can  be  shown  [15]  that  the  total  block  least  squares 
solution  for  S  is 


SsTiS  =  -(Vi2V22^)^. 


(12) 
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Fuzzy  K-Means  Partitions  Clustering  algorithms  attempt  to  identify  natural  partitions 
in  data  that  exemplify  distinct  statistical  behavior.  This  method  has  been  adapted  to  estimate  the 
endmembers  and  abundances  from  data  in  a  scene  so  that  a  global  metric  based  on  squared-error 
is  minimized  [10].  For  a  collection  of  N  pixels,  the  objective  function  may  be  expressed  as 

p  N 

jm{A,  s)  =  j2  (13) 

i=l 

where  Aij  is  an  estimate  of  the  i-th  abundance  for  the  j-th  pixel  and  dij^  the  weighted  squared-error 
between  the  j-th  pixel  and  the  i-th.  centroid,  or  endmember,  Sj,  is 

(dy)2  =  (xj-Si)^W(xj-Si).  (14) 

The  weighting  matrix,  W,  permits  certain  abundances  to  be  weighted  more  than  others.  The 
minimization  of  Jm(A,  S)  is  constrained  by  physical  restrictions  on  the  values  of  abimdances  (see 
Section  3.4).  These  requirements  are  inserted  into  the  estimator  for  A.  As  noted,  m  indicates  the 
degree  of  fuzziness,  and  the  shapes  of  the  regions  associated  with  each  class  centroid  vary  as  m  is 
changed.  Suggested  values  for  m  fall  in  the  range,  1.5  <  m  <  3.0.  The  optimization  of  Jm{A^S) 
is  accomphshed  iteratively  using  a  variant  of  the  FC-means  clustering  algorithm  [2],  until  Jm(A,  S) 
achieves  a  minimum  using  a  final  estimate  of  the  endmembers,  S,  and  abundances,  A. 

3,3.3  Statistical  and  Parametric  Endmember  Determination 

In  this  section,  we  consider  approaches  to  determining  endmembers  that  utilize  statistics  and 
assume  an  imderlying,  parametric  model  for  the  received  pixel  spectra. 


3.3.3. 1  Maximum  Likelihood  Estimation  To  derive  a  maximum  likelihood  (ML)  estimate 
of  a  deterministic  variable,  a  forward  density  for  the  received  data  is  maximized.  In  this  section 
we  explore  ML  algorithms  for  estimating  endmembers.  However,  while  these  algorithms  estimate 
non-random  parameters,  this  does  not  necessarily  mean  the  endmembers  themselves  are  determin¬ 
istic.  Stochastic  endmembers  generalize  the  concept  of  deterministic  spectra  to  random  spectra 
having  parametric,  multivariate  representations.  So  even  if  the  endmembers  are  stochastic,  an  ML 
algorithm  can  still  estimate  the  deterministic  parameters  of  the  endmember  densities. 


Non-linear  Least  Squares  The  nnmixing  problem  can  be  formulated  as  an  inverse  prob¬ 
lem  [32]  whose  objective  is  to  estimate  system  parameters  from  observations.  By  virtue  of  the 
assumptions  of  linear  mixing  and  Gaussianity,  the  problem  can  be  reduced  to  that  of  ML  estima¬ 
tion  using  a  quadratic  objective  fimction  containing  the  unknown,  deterministic,  system  parameters. 
However,  when  both  endmembers  and  abundances  are  being  estimated  simultaneously  in  the  linear 
mixing  model,  the  minimization  of  quadratic  error  results  in  a  non-hnear  least  squares  formulation. 
Starting  with  the  general  formulation  [32],  this  concept  has  been  adapted  to  an  iterative  algorithm 
that  is  initialized  by  an  a  priori  estimate  for  endmembers  and  abundances  and  simultaneously 
derives  an  ML  estimate  for  endmembers  and  fractional  abimdances  [33]. 
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The  elements  of  S  and  a  are  unknown,  deterministic  parameters  that  we  combine  in  a  single 
{MP  4-  P)  X  1  Gaussian  random  vector,  g  =  [5ii, . . . ,  SmPj  Then,  the  following  recursion  for 
g  is  given  by 

g(fe+i)  =  +  r-^x  -  s«aW)  - 

(15) 

g(^)  is  the  fc-th  estimate  of  g  which  contains  and  g(^)  is  the  initial  guess  for  g,  and  r^(o)^(o) 
is  its  covariance  matrix  under  the  assumption  that  g^^^  is  Gaussian,  x  is  the  received  pixel  spectra 
having  as  its  covariance  which  is  equal  to  the  additive  noise  covariance.  is  the  constraint 
matrix  at  the  A:-th  iteration. 

A  priori  estimates  of  mixing  coefficients  and  endmembers  can  originate  from  previous  exper¬ 
iments  or  from  laboratory  measurement,  and  the  degree  of  confidence  in  these  initial  estimates  is 
specified  in  their  respective  covariance  estimates  which  comprise  r^(o)^(o).  All  pixels  in  an  image 
are  modeled  simultaneously,  and  consequently,  the  number  of  mixing  coefficients  being  solved  for 
is  the  munber  of  total  pixels  multiphed  by  P.  The  iterative  algorithm  in  (15)  is  repeated  until  the 
difference  between  successive  estimates  falls  below  a  prescribed  threshold.  Clearly,  consists 

of  an  adjustment  to  g^^^  based  on  the  weighted  sum  of  the  parameter  vector  error, (g^^)  — 
and  the  spectrum  error,  (x  —  In  the  case  of  both  errors,  the  difference  is  measured  with 

reference  to  the  starting  values  in  not  the  previous  value  of  g.  The  implication  here  is  that 

dramatic  changes  in  estimates,  when  gauged  against  the  values  in  g,  are  inversely  weighted  by  the 
associated  confidence  in  the  original  estimates  appearing  in  r^(o)^(o). 


Gaussian  Class  Estimation  An  adaptation  of  the  linear  mixing  model  combines  the  ge¬ 
ometric  approach  in  Section  3.3. 1.1  with  stochastic  data  clustering  approaches  [30].  The  stochastic 
mixing  model  (SMM)  introduces  the  concept  of  hard  endmember  classes  as  the  stochastic  extension 
of  deterministic  endmembers  and  assumes  that  all  data  in  a  scene  is  Gaussian  and  arises  from  a 
hnear  combination  of  at  least  one  hard  class.  Geometrically,  each  stochastic  endmember  describes 
a  cluster  of  pixels  on  the  perimeter  of  the  data  cloud,  and  each  pixel  should  belong  to  a  partic¬ 
ular  combination  of  hard  classes.  If  each  fractional  combination  of  hard  classes  is  a  mixed  class, 
with  its  own  Gaussian  statistics  -  mean,  covariance,  and  prior  probability  -  then  a  pixel  may  be 
iinmixed  by  determining  which  combination  of  classes  has  the  greatest  hkehhood  of  realizing  that 
pixel.  After  an  initial  partition  of  the  data,  the  newly  re-classified  pixels  are  used  to  generate  new 
class  statistics  until  the  associated  SMM  class  parameters  no  longer  change.  This  procedure  is  the 
stochastic  equivalent  of  the  expectation  maximization  (EM)  algorithm  for  ML  estimation  [23] . 

We  can  demonstrate  this  concept  for  two  hard  classes.  Let  a  scene  be  comprised  of  two  hard 
classes,  Xi  ~  N{fii,Ti)  and  X2  N{fjt2,  ^2)^  each  having  a  mean  and  covariance  that  describes  the 
M-dimensional  Gaussian  density  function  for  pixels  belonging  to  that  class:  Then,  a  mixed  pixel 
is  a  linear  combination  of  observations  arising  from  the  two  hard  classes  so  that 

X  =  /xi  +  (1  -  /)X2  (16) 

and,  consequently,  for  a  fixed  value  of  /,  the  corresponding  statistics  are  (if  =  fni  +  (1  -  /)/i2  and 
Tf  —  +  (1  —  /)^r2  when  xi  and  X2  are  uncorrelated.  Since  /  lies  in  the  interval,  [0, 1],  an 
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infinite  number  of  soft  classes  exists,  but  if  /  is  discretized  on  [0, 1],  a  finite  number  of  soft  classes 
is  attained,  each  described  by  the  resulting  Gaussian  statistics.  If  the  number  of  hard  classes,  C/^, 
is  greater  than  2,  /  must  be  a  vector  of  length  —  1,  whose  elements  are  discretized,  describing 
the  contribution  from  the  Ck  hard  classes.  If  C  is  the  total  number  of  hard  and  mixed  classes,  then 
each  class,  c  =  1, . . . ,  C,  has  a  mean,  covariance,  and  prior  probabihty,  (/Xc,  Tc,  He). 

The  number  of  hard  classes,  Ch,  is  manually  estimated  before  iterating  by  matching  pixel 
regions  in  the  scene  with  reference  spectra.  Given  a  discretization  factor  for  the  fractional  abun¬ 
dance  of  each  hard  class,  a  lattice  of  acceptable  soft  classes  is  created  to  represent  the  space  of  all 
possible  mixing  coefficients.  The  iterative  estimation  of  class  parameters  (/^c?  Tc?  He)  is  a  maximum 
likelihood  (ML)  search  of  class  statistics  that  is  performed  by  employing  a  stochastic  variation 
on  the  traditional  Expectation-Maximization  (EM)  algorithm  [23].  The  class  parameters  are  re¬ 
fined  after  each  round  of  reclassification  until  successive  iterations  yield  negligible  changes,  and  the 
abimdances  for  each  pixel  are  ascertained  from  the  mixture  describing  the  nearest  class. 


Independent  Component  Anedysis  The  linear  mixing  model  predicts  that  the  signal 
received  by  a  sensor  from  a  scene  is  the  sum  of  spectra  from  distinct  substances.  The  substances 
may  be  man-made  or  natural,  and  without  prior  knowledge,  the  challenge  of  unmixing  is  to  identify 
the  spectra  of  the  components  and  the  associated  abundances.  A  key  assumption  is  that  the  spectra 
are  statistically  independent  of  each  other  with  the  physical  significance  being  that  the  presence 
of  one  substance  in  a  mixed  pixel  should  not  interfere  with  the  scattering  of  another,  regardless 
of  the  quantities  in  which  they  appear.  This  is  a  significantly  stronger  condition  than  the  more 
traditional  objective  of  decorrelation  present  in  Gaiissian  analysis. 

Beyond  decorrelation,  independence  means  that  the  parametric  densities  for  the  spectra  of 
substances  are  separable.  If  Pxi(xi)  and  Px2(x2)  are  the  densities  from  two  distinct  substances, 
then  independence  implies 


Pxi  ,X2  (^1  ?  ^2  )  —  Pxi  (xi  )Px2  (^2  )  •  (  ^ 

Independent  component  analysis  (IC A)  is  recognized  as  a  special  case  of  the  blind  source  separation 
problem  where  the  objective  is  to  estimate  the  parameters  of  statistically  independent  sources,  s, 
that  combine  through  a  mixing  matrix,  W,  to  form  an  array  of  observations,  x,  collected  at  the 
sensor.  Considered  as  a  time  series,  each  element  of  the  array  possesses  its  own  statistics,  and 
under  the  assumption  of  linear  mixing,  they  can  be  used  to  estimate  the  mixing  coefficients  and 
the  parameters  of  the  source  distributions  so  that  the  statistical  independence  of  the  sources  is 
maximized. 

In  the  adaptation  of  ICA  to  spectral  unmixing,  however,  we  assxune  endmembers  are  deter¬ 
ministic;  but,  we  can  consider  a  collection  of  similar  mixed  pixels  to  be  multiple  occurrences  of 
mixing  by  the  same  independent  sources,  or  endmembers,  that  obey  x  =  Ws.  In  lieu  of  a  time 
series  to  derive  statistics,  the  spectral  values  for  each  mixed  pixel  are  converted  to  a  histogram. 
Likewise,  a  similarly  sjmthetic  histogram  for  each  of  the  sources  is  posited  from  the  unknown  end- 
member  spectra.  Based  on  an  assumed  parametric  form  for  the  cumulative  distribution  functions 
for  these  synthetic  densities,  ICA  estimates  the  shape  of  the  source  distribution  functions  that 
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maximize  the  independence  of  the  endmember  histograms,  and  still  combine  linearly  to  yield  the 
received  histograms.  In  order  to  explicitly  identify  the  endmembers  in  the  scene,  these  histograms 
are  then  correlated  with  distribution  functions  derived  from  histograms  of  reference  spectra  [1]. 

To  maximize  the  independence  of  the  sources,  the  KuUback-Leibler  distance,  G,  compares  the 
synthetic  histogram  observed  at  the  sensor  with  the  one  derived  from  the  estimated  linear  mixtmre 
of  independent  sources. 

G(px(x),Px(x;w,W))  =  y'px(x)log^-^^^^dx  =  ff(x)-ypx(x)logj6x(x;w,W)dx.  (18) 

No  additive  noise  is  considered  in  this  model.  Here,  Px(x)  is  the  synthetic  histogram  for  x  that  is 
manufactured  empirically  from  the  received  spectra.  The  histogram  for  the  linear  model  estimate 
of  X  is  Px(x;  w),  where  w  is  a  vector  containing  the  parameters  defining  the  densities  of  the  sources 
as  well  as  the  mixing  coeflBcients,  W.  To  minimize  (18),  partial  derivatives  with  respect  to  the 
elements  of  w  yield  a  learning  rule  for  a  neural  network  to  arrive  at  an  ML  estimate  for  S. 


3.4  INVERSION  TAXONOMY 

In  this  section  we  examine  numerous  algorithms  that  yield  fractional  abundances,  a,  from  a 
mixed  pixel  spectrum,  x.  Figure  5  illustrates  the  taxonomy  of  these  inversion  algorithms.  Inver¬ 
sion  is  essentially  one  form  of  the  classical  parameter  estimation  problem  [35].  Given  a  vector  of 
observations,  x,  and  a  signal  model,  we  wish  to  estimate  a  vector  of  parameters,  a.  As  in  Sections 
3.2  and  3.3,  we  organize  the  techniques  in  this  section  according  to  the  featmres  in  Figure  2.  To 
add  further  structure,  two  specific  features  have  been  included:  1)  the  type  of  values  estimated 
abundances  may  have,  2)  the  type  of  endmembers  required,  and  3)  the  kind  of  noise,  if  any,  in  the 
signal  model. 

Any  meaningful  estimate  of  a,  however,  must  comply  with  constraints  that  make  it  physically 
realizable.  In  fact,  the  single  most  chaUenging  aspect  of  unmixing  is  determining  how  to  reconcile 
mathematical  and  statistical  techniques  with  the  underlying  physical  restrictions.  With  this  in 
mind,  any  estimate  of  a  must  obey  the  following  constraints: 


Non-negativity  The  abundances  should  be  non-negative  to  be  meaningful  in  a  physical  sense: 

flj  >  0, z  =  1, . . .  ,P. 

Purity  A  fractional  abtmdance  coefficient  should  not  exceed  1:  Uj  <  1,  z  =  1, . . . ,  P. 

Ftill/Partial  additivity  Full  additivity  requires  the  abundances  for  a  mixed  pixel  to  sum  to  1, 
with  the  implicit  assumption  that  aU  the  endmembers  comprising  the  pixel  spectrum  in  x 
are  present  in  the  columns  of  S:  ai  —  l.  Partial  additivity  is  a  generalization  that  only 

requires  the  sum  of  abundances  to  be  less  than  or  equal  to  one,  and  it  applies  when  the  set 
of  endmembers  in  the  scene  might  be  incomplete:  —  1- 
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Figure  5.  Taxonomy  of  inversion  algorithms. 
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3.4.1  Non-statistical  and  Non-parametric  Inversion 

In  this  section,  we  examine  inversion  algorithms  that  are  neither  statistical  nor  presume  a 
parametric  formulation  for  the  data.  The  abimdances  are  considered  to  be  deterministic. 


3.4.1. 1  Non-squared-error  We  now  examine  methods  for  inversion  that  do  not  rely  on  squared- 
error,  but,  instead,  explore  practical  aspects  of  inversion  while  addressing  some  or  all  of  the  physical 
constraints. 


Filter  Vector  Algorithm  The  Filter  Vector  Algorithm  (FVA)  [6]  is  a  module  of  ORASIS 
that  performs  inversion  by  designing  an  ensemble  of  filters  that  are  in  one-to-one  correspondence 
with  each  endmember  in  S.  Crucial  to  producing  this  result  is  that  the  colmnns  of  S  must  be 
linearly  independent.  When  the  spectrum  of  a  mixed  pixel  is  correlated  by  one  filter,  the  output  is 
an  estimate  of  the  fractional  abundance  belonging  to  the  corresponding  endmember.  Thus,  if  F  is 
an  M  X  P  matrix  whole  columns  are  filters,  two  desirable  properties  are  F^S  =  Ip  and  F^j  =  0, 
where  j  is  an  M  x  1  vector  of  ones.  The  first  condition  requires  the  filters  to  be  orthogonal  to  all 
endmembers  except  one.  The  second  condition  requires  that  each  filter  be  zero-mean,  i.e,  the  filters 
should  reject  all  DC  terms  in  the  pixel  being  munixed  and  is  useful  when  w  in  (1)  has  a  non-zero 
mean.  The  solution  to  F  can  be  shown  to  be 

F  =  S((S^S)-^)^  (19) 

where  F  requires  the  mean-removed  matrix  of  endmembers,  S  =  (Im  “  J  is  a  matrix 

of  ones. 


I-Divergence  As  stated  earher,  a  key  to  unmixing  is  findiug  mathematical  structures 
amenable  to  the  physical  limitations  inherent  to  the  problem.  Mixed  pixel  spectra  are  non-negative, 
and,  consequently,  a  distance  measure  amenable  to  comparing  non-negative  data  is  the  I-divergence 
which  compares  two  non-negative  deterministic  functions  and  yields  a  non-negative  measure  of  dis¬ 
similarity  having  a  value  of  zero  when  the  functions  are  identical  [28].  For  two  non-negative  vectors 
of  length  M,  x(l)  and  x(2),  the  I-divergence  is  defined  by 


M 


7(x(l),  x(2))  =  I  I  -  -  "’iP)) 


La;i(2) 


M 


i=l 


(20) 


Using  the  block  notation  for  N  pixels  and  given  X  and  S,  the  inversion  algorithm  selects  the 
elements  of  A  for  the  N  pixel  spectra  in  X  so  that,  based  on  (20),  they  minimize 


N  M 

/(A)  =  /  (xj,  AjjSj).  (21) 

j=l  i=l 

Imposed  on  the  colmnns  of  A  during  the  optimization  is  the  full  additivity  constraint.  1(A)  is 
minimized  by  a  recursion  that  generates  successive  values  for  the  entries  of  A.  Although  the 
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Figure  6.  Confidence  measurement  for  b0%-50%  mixture  (triangles)  and  30%-70%  (circles)  mixture. 


algorithm  does  not  incorporate  a  statistical  noise  model,  each  estimate  for  A  upholds  the  non¬ 
negativity,  purity,  and  full  additivity  constraints. 


Confidence  Estimates  Most  inversion  algorithms  utilize  one  endmember  to  represent  the 
spectrum  of  a  particular  substance,  or  class.  Depending  on  the  observation  conditions,  however, 
one  class  may  require  multiple  spectra  to  satisfactorily  capture  its  variabihty.  The  possibility  of 
intra-class  variability,  and  its  impact  on  inversion,  is  addressed  for  two-class  problems  using  a 
non-parametric  technique  that  estimates  abtmdances  based  on  the  geometric  proximity  of  spectral 
observations  to  the  spectrum  belonging  to  each  class  [25]. 

This  approach  is  best  illustrated  graphically.  In  Figure  6,  consider  the  pixel  to  be  unmixed, 
denoted  by  a  filled  square,  using  two  classes.  The  various  instantiations  of  each  class  are  denoted 
by  A  asterisks  and  B  pluses,  respectively.  The  mixed  pixel  may  then  be  a  linear  combination  of 
any  pair  of  observations  arising  from  the  two  classes.  A  line  between  a  pair  indicates  the  locus  of 
values  that  a  mixed  pixel  may  have,  and  the  location  on  the  line  is  indicative  of  the  percentage 
of  each  component.  For  instance,  the  triangles  indicate  positions  on  the  lines  having  50%  of  each 
class,  whereas  circles  indicate  a  30%-70%  relationship.  If  a  circle  of  appropriate  size  is  drawn 
around  aroimd  the  pixel  to  be  unmixed,  confidence  estimates  may  be  derived  for  each  possible 
mixing  proportion  by  counting  the  number  of  occurrences  falling  within  the  circle  and  dividing  by 
the  total  number  of  possible  model  mixtures,  AB.  Based  on  these  ratios,  a  pixel  is  assigned  the 
abundances  yielding  the  highest  confidence  estimate. 

Abundance  estimates  based  on  this  technique  automatically  obey  the  non-negativity,  purity, 
and  full  additivity  constraints.  Since  an  infinite  number  of  confidence  estimates  can  be  generated,  a 
discretized  lattice  of  abundance  estimates  most  likely  would  be  used  to  obtain  confidence  estimates. 


3.4.1.2  Squared-error  We  now  investigate  solutions  that  are  based  on  the  method  of  least 
squares.  Building  upon  the  unconstrained  least  squares  estimate,  inversion  algorithms  striving  to 
minimize  squared-error  address  physical  constraints  detailed  earlier  as  well  as  realistic  limitations 
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that  arise  from  practical  scenarios.  Computational  approaches  for  minimizing  quadratic  cost  func- 
tions  with  linear  constraints  fall  into  the  realm  of  quadratic  programming  and  are  not  discussed 
here. 


Unconstrained  Least  Squares  The  theory  of  least  squares  yields  an  estimate,  a,  that 
minimizes  the  total  squared-error  between  x  and  Sa,  a  =  mina  |x  —  Sap.  The  unconstrained  least 
squares  estimate,  a^,  is  given  by  [31] 

=  (S^S)-^S^x.  (22) 

Implicit  in  this  estimate  are  two  assumptions.  First,  M  >  P,  which  is  typical  in  hyperspectral 
systems.  Second,  S  is  an  M  x  P  matrix  with  full  column  rank  so  that  (S^S)““^  exists. 


Full  Additivity  The  addition  of  the  full  additivity  constraint  to  the  least  squares  problem 
restricts  the  optimal  estimate  of  a  to  reside  on  a  hyperplane  (a  surface  whose  dimensionality  is  one 
less  than  the  signal  dimension).  This  is  stated  formally  as 

p 

a^  —  min  |x  -  Sap,  ^  =  1.  (23) 

^  i=:l 

The  general  least  squares  estimate  with  hnear  constraints  is  derived  using  Lagrange  multiphers  to 
be  [16] 

-  (S^S)-iZ^[Z(S^S)-^Z^]-^(ZaU  -  b).  (24) 

a^  is  the  least  squares  estimate  of  the  abundances  when  the  full  additivity  constraint  is  enforced, 
Z  is  an  P  X  P  matrix  and  b  is  an  P  x  1  vector  that  together  convey  the  P  hnearly  independent 
equations,  Za^  =  b,  that  serve  as  hnear  constraints  on  the  solution  of  (1).  To  enforce  full  additivity, 
Z  is  a  1  X  P  row  vector  whose  entries  are  aU  ones,  and  b  =  1.  Examining  (24)  reveals  that  the 
solution  enforcing  full  additivity  consists  of  the  imconstrained  LSE  solution,  a^,  with  an  additive 
correction  term  that  depends  on  the  matrix  of  endmembers,  S,  and  the  error  incurred  by  the 
imconstrained  solution,  a^,  in  satisfying  the  full  additivity  constraint. 


Projection  on  Convex  Sets  One  problem  with  the  unconstrained  least  squares  estimate 
in  (22)  happens  when  S  does  not  possess  full  column  rank.  In  hyperspectral  sensing,  this  occurs 
when  the  endmembers  are  hnearly  dependent,  a  situation  that  is  encoimtered  when  pixels  are 
inverted  using  spectrally  similar  endmembers. 

This  problem  is  addressed  by  defining  two  convex  sets,  one  based  on  S  and  the  other  by  the 
non-negativity  constraint,  and  by  recognizing  that  any  a  which  solves  (1)  must  necessarily  obey  two 
properties  [34].  First,  the  entries  of  a  must  be  non-negative  to  be  physicaUy  meaningful.  Second, 
any  valid  a  should  be  expressible  as  a  sum  of  two  components,  a  =  a^  4-  a^.  The  first  term,  a^ 
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resides  in  the  column  space  of  S,  and  resides  in  its  null  space.  The  non-negativity  constraint  on 
a,  can  be  imposed  by  adding  a  requirement  on  a’^. 

af  >  -af,  l<i<P.  (25) 

The  objective  is  to  find  a  value  for  a^  that  resides  in  the  null  space  of  S  -  this  assures  that  Sa’^ 
contributes  nothing  to  the  mixed  pixel  spectrum  -  but  also  resides  in  the  convex  space  of  all  possible 
vectors  having  entries  greater  than  the  negative  of  the  corresponding  entries  in  a^,  assuring  the 
non-negativity  of  the  entries  in  a. 

An  iterative  algorithm  determines  a^  so  that  it  resides  in  the  intersection  of  the  convex 
set  defined  by  the  null  space  of  S  and  the  convex  set  of  values  satisfying  (25).  The  algorithm 
alternately  projects  solutions  for  a  derived  from  one  convex  set  onto  the  other  convex  set  until  a 
standard  stopping  criterion  is  met.  Successive  values  for  a’^  converge  in  the  intersection  of  the  two 
convex  sets. 

3.4.2  Statistical  and  Non-parametric  Inversion 

In  this  section  we  consider  inversion  algorithms  that  utilize  statistics  to  invert  pixels  but  do 
not  rely  on  parametric  expressions  to  model  their  random  behavior. 


3.4.2. 1  Squared-error  We  examine  several  inversion  algorithms  in  this  section  that  minimize 
squared-error. 


Minimiun  Variance  Unbiased  Estimator  If  w  is  a  zero-mean  random  process  in  (1) 
and  has  a  covariance,  then  the  minimum  variance  estimate,  a^,  is  defined  as 

aV  =  {S'^TZ^Sy'^S'^T-^x.  (26) 

This  solution  is  based  on  the  weighted  least  squares  approach  [16]  and  resembles  a^  in  (22),  with 
the  common  caveat  that  S  must  have  full  column  rank.  Incorporating  second-order  statistics  of  the 
random  noise,  w,  minimizes  (x—  Sa)^r”^(x  —  Sa)  rather  than  (x  —  Sa)^(X“  Sa).  Because  the 
imbiased  estimator  in  (26)  is  the  best  linear  imbiased  estimator  (BLUE)  it  is  called  the  minimum 
variance  unbiased  estimator  (MVUE). 


Ground-truth-based  Estimators  Ground  truth  consists  of  a  set  of  pixel  spectra  from  a 
scene  for  which  the  fractional  abundances  are  known,  or  estimates  exist,  and  an  estimator  designed 
for  optimality  on  this  training  set  may  then  be  used  to  invert  other  mixed  pixels  in  the  scene  with 
the  advantage  of  deriving  estimates  without  knowledge  of  the  actual  endmember  spectra  in  the 
scene.  Two  estimators  based  on  this  concept,  each  optimizing  a  different  quantity  in  the  ground 
truth  region,  have  been  proposed  to  estimate  abimdances  [27]. 

Before  comparing  the  two  techniques,  we  introduce  nomenclature  that  is  useful  to  both. 
Adopting  block  notation  for  N  training  pixels,  let  the  estimate  for  the  abundance  correlation  ma¬ 
trix  be  given  by  R  =  ;^AA^,  let  the  estimate  of  the  endmembers  be  given  by  S  =  (;^XA^)R”^, 
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and  let  the  noise  covariance  estimate  be  defined  as  =  ^(X  —  SA)(X  —  SA)^  =  ^XX^  — 
(;^XA^)R~^(;^AX^).  We  can  also  denote  the  average  pixel  spectra  by  x  and  the  average  abxm- 
dance  in  the  training  set  by  a.  Also,  let  j  be  a  P  x  1  vector  of  ones,  and  let  J  = 

The  classical  estimator  postulates  a  physical  model  (the  linear  mixture  model)  for  the 
S5Tithesis  of  a  mixed  pixel,  estimates  the  parameters  of  this  model  from  the  training  set,  then 
inverts  the  model  to  estimate  abtmdances  for  the  remaining  image  pixels.  It  minimizes 

trace[  (X  -  SA^)^r-^  (X  -  SA^)  ]  (27) 

which  is  the  siim,  over  all  N  training  pixels,  of  the  weighted  squared-error  between  X  and  SA^. 
The  classical  estimator,  aP,  is  given  by 

a^  =  a  +  E^S^f (x  -  x).  (28) 

Note  that  while  a  is  the  average  fractional  abundance  in  the  training  set,  it  is  constrained  to  uphold 
the  full  additivity  constraint,  i.e.,  the  sum  of  the  entries  in  a  equals  one.  is  the  prediction  error 
matrix  for  the  classical  estimator  and  is  defined  as  E^  =  U  -  aUJU,  where  U  =  (S^f^^S)"^  and 

a  =  (Fuj)-^ 

The  inverse  estimator  generates  an  estimate  for  a  that  exemplifies  the  precepts  of  multi¬ 
variate  regression.  Whereas  the  classical  estimator  sought  to  minimize  the  weighted  squared-error 
between  X  and  SA^,  the  inverse  estimator  minimizes 

trace[  (A^  -  A)^(aI  -  A)  ]  (29) 

which  is  the  squared-error  between  A  and  A-^.  The  solution  for  a-*-  is  given  by 

=  a  +  E^S^f (x  -  x)  (30) 

where,  if  C  =  Q  — QJQ,  E^  =  C(C-I-U)~^U.  Reviewing  (28)  and  (30)  reveals  that  both  estimators 
only  differ  by  their  respective  prediction  error  matrices,  E^  and  E^,  whose  the  diagonal  elements 
provide  the  mean  squared  prediction  errors  for  the  associated  ground  class  type. 

3.4.3  Statistical  and  Parametric  Inversion 

In  this  section  we  consider  inversion  algorithms  that  utilize  statistical,  parametric  represen¬ 
tations  of  the  received  data  to  invert  mixed  pixels.  The  principle  distinction  between  algorithms 
in  the  section  arises  from  the  assumption  of  randomness  for  abundances.  ML  techniques  presume 
the  abundances  are  deterministic  quantities,  while  MAP  techniques  assume  they  axe  random,  ne¬ 
cessitating  a  prior  density  for  the  abundances. 


3.4.3. 1  Maximum  Likelihood  We  consider  inversion  algorithms  in  this  section  that  assume 
the  received  spectra  obey  a  parametric  model  but  have  deterministic  abundances.  In  all  cases,  each 
algorithm  maximizes  a  forward  density,  Px|a(^l^)5  to  obtain  an  ML  estimate  for  the  a. 
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Gaussian  Minimum  Variance  Unbiased  Estimate  The  performance  of  the  MVUE  in 
Section  3.4.2. 1  can  be  further  quantified  if  the  noise,  w,  in  (1)  is  zero-mean  and  also  Gaussian. 
In  this  case,  the  estimator,  a^^,  equals  a^  in  (26).  In  addition  to  the  previously  emnnerated 
properties,  a^^  also  achieves  the  Cramer-Rao  lower  bound  for  imbiased  estimators. 


Weighted  Mahalanobis  Distance  When  the  additive  noise,  w,  in  (1)  is  Gaussian,  an 
ML  estimator  finds  a  that  maximizes  a  Gaussian  forward  density,  Px|a(x|a).  This  is  equivalent  to 
TTiiniTniTiiTig  the  Mahalanobis  distance,  which  is  defined  as  (x  —  Sa)^r“^(x  —  Sa). 

On  the  other  hand,  the  uncertainty  in  x  introduced  by  w  may  be  alternately  interpreted  as 
the  uncertainty  in  the  endmember  spectra  in  S  [18].  This  alternative  to  the  signal  model  in  (1) 
is  expressed  as  x  =  (S  4-  E)a,  where  EF  is  an  M  x  P  matrix  whose  columns,  ei,i  =  1, . . .  ,P, 
are  M-dimensional  Gaussian,  vectors  having  mean,  /itsj,  and  covariance,  The  covariance  for 
Px|a(xla),  Fxiai  is  now  a  sinn  of  covariances  and  cross-covariances,  each  weighted  by  the  associated 
firactional  abundances,  F^ja  =  O’iO'j'^SiSy  The  dependence  of  F^ia  on  a  makes  an  ML 

optimization  with  respect  to  a  considerably  more  comphcated  than  the  standard  minimization 
of  the  Mahalanobis  distance.  The  log-likelihood  function  to  be  minimized,  A(x),  derived  firom 

Px|a(xla),  is 

A(x)  =  In  IFxial  +  (x  -  Sa)’’F;ijx  -  Sa).  (31) 

Invoking  the  full  additivity  constraint  to  reduce  the  number  of  variables  by  one,  the  mini¬ 
mization  of  (31)  involves  finding  a  (P  —  l)-dimensional  subspace  in  that  contains  x  as  well  as 
a  projection  of  Sfc,  fc  =  1, . . . ,  P.  The  goal  is  to  iteratively  minimize  the  metric  in  (31)  by  param¬ 
eterizing  all  possible  P  —  1  subspaces  and  finding  the  best  family  of  basis  vectors.  The  method, 
however,  does  not  inherently  enforce  the  non-negativity  and  purity  constraints. 


3.4.3.2  Maximum  A  Posteriori  In  this  section,  we  depart  from  the  assumption  that  the 
abundances  are  deterministic  quantities.  Several  ML  algorithms  resulted  in  optimal  solutions  when 
w  is  Gaussian  and  a  is  assmned  to  be  deterministic.  If,  however,  the  abundances  are  now  considered 
to  be  random,  a  maximum  a  posteriori  (MAP)  formulation  may  be  employed  to  unmix  pixels.  The 
general  form  of  this  estimator  is  denoted  by 


m^Pa|x(a|x)  =  maxp,,|a(xla) 


Pa(a) 

Px(x)' 


(32) 


Invoking  a  standard  assmnption,  we  can  ignore  Px(x)  since  it  has  no  dependence  on  a.  The 
construction  ofpa(a),  however,  is  critical  in  the  following  algorithms,  as  well  as  in  Bayesian  inference 
in  general. 


Fuzzy  Membership  We  consider  an  algorithm  for  inverting  mixed  pixels  that  is  a  general¬ 
ization  of  a  classification  technique  that  permits  a  pixel  to  have  fractional  membership  in  mxiltiple 
classes  [17].  Appropriately  constrained,  the  fractional  memberships  are  construed  as  abundance 
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coefficients,  and  the  Bayesian  model  in  (32)  is  employed  to  estimate  a.  The  approach  assumes  that 
the  abundance  vector,  a,  is  random  and  Gaussian,  and  the  columns  of  S  are  the  average  spectra 
for  P  distinct,  known  classes. 

For  a  system  of  P  abundances,  the  dimensionality  of  the  solution  space  for  a  is  reduced  from 
P  to  (P  -  1)  by  imposing  the  full  additivity  constraint.  Accordingly,  a,  is  transformed  to  a  new 
(M  ~  1)  X  1  vector,  v,  through  one  of  two  possible  differencing  transformations.  The  resulting 
distributions  for  both  evaluations  of  v  are  approximated  by  the  quasi-intrinsic  Gaussian  condi¬ 
tional  autoregression  (QICAR)  which  asserts  that  the  density  of  v  is  equivalent  to  the  difference 
between  Gaussian-distributed  membership  vectors  in  a  symmetric  neighborhood  around  the  pixel 
in  question.  In  essence,  the  differenced  abundance  vector,  v,  for  a  single  pixel  is  approximated  by 
differences  in  class  membership  for  a  surrounding  neighborhood  of  pixels.  Then,  since  conditioning 
on  a  is  equivalent  to  conditioning  on  v,  the  Bayesian  formulation  in  (32)  can  be  optimized  with  v 
in  place  of  a. 

The  derivative  of  (32)  with  respect  to  v  is  performed  using  the  Gaussian  QICAR  distributions 
for  the  prior.  The  resulting  expressions  can  be  reworked  to  yield  an  iterative  structure  which 
successively  estimates  v,  from  which  a  can  be  obtained.  However,  the  enforcement  of  the  full 
additivity  constraint  does  not  preclude  values  of  a  from  being  negative  or  greater  than  one.  This 
is  corrected  by  projecting  a  to  the  nearest  point  in  the  space  of  acceptable  values. 

Log-Odds  Transformation  Enforcing  the  non-negativity  and  purity  constraints  requires 
a  constrained  optimization  of  (32).  This  problem  is  addressed  by  returning  the  constrained  opti¬ 
mization  to  an  imconstrained  one  by  warping  the  parameter  space  so  that  techniques  intended  for 
unconstrained  optimizations  may  be  used  [29].  The  approach  assumes  w  in  (1)  has  a  multivariate 
Gaussian  distribution.  Acknowledging  uncertainty  in  the  exact  values  in  (1),  the  full  additivity  con¬ 
straint  is  relaxed,  and  Pai^i)  is  modeled  as  a  imivariate  Gaussian  distribution  with  a  small  variance, 
describing  the  sum  of  the  abundances  as 

Pa  (a)  oc  exp  )  (33) 

where  /i(a)  =  Note  that  ranges  from  (—00,00),  which  contradicts  the  non-negativity 

and  purity  constraints  on  the  abundances.  This  will  be  addressed  shortly. 

The  posterior  density  can  then  be  expressed  as 

Pa|x(a|x)  (X  exp(-  -1^-  -  (x  -  Sa)’’E~^(x  -  Sa))  (34) 

where  the  optimal  solution  for  a  maximizes  this  posterior  density.  However,  by  (34),  the  optimal 
value  for  a  may  reside  outside  of  [0, 1].  This  possibihty  was  introduced  by  Gaussian  densities  having 
non-zero  probabilities  for  all  values  of  Instead,  a  desirable  operation  would  apply  a  reversible 
transformation  to  a,  transforming  the  interval,  [0, 1],  to  [—00,00].  Such  a  warping  of  the  parameter 
space  would  return  the  problem  to  an  unconstrained  optimization  in  the  transformed  space. 


(35) 


Let  the  transformation  defining  a%  be  defined  as 


4  =  log(-^),  k  =  l,...,P. 

I  —  dk 


dk 


If  a*  admits  a  Gaussian  distribution,  then  the  posterior  density  for  a*  can  be  written  as 

p,.|,(a*|x)  cx  -  (x  -  S*a*)’"S-'(x  -  S*a*)). 

Both  h*{')  and  S*  are  operators  that  incorporate  the  inverse  transformation  of  (35)  as  well  as  /i(-) 
and  5  respectively.  The  prior  distribution,  although  its  arguments  have  not  been  transformed  by 
(35)  is  retained  since  it  is  also  valid  over  (-00,00).  Newton’s  method  is  discussed  as  a  means  of 
maximizing  (36). 


Maximum  Entropy  The  Bayesian  formulation  in  (32)  may  be  reconsidered  from  the  view¬ 
point  of  entropy  [7].  ProbabUity  distributions  are  required  to  be  non-negative  and  sum  to  one,  a 
property  shared  by  the  full  additivity  constraint.  If  a  is  a  P  x  1  vector  of  fractional  abundances 
belonging  to  a  pixel,  the  corresponding  entropy,  if  (a),  is 

P 

if  (a)  =  -^ailnai.  (37) 

i—l 

The  entropy  derived  from  a  vector  of  abundances,  a,  for  a  mixed  pixel  can  be  shown  to  be, 
through  a  combinatoric  approximation,  proportional  to  the  number  of  ways  a  pixel  may  contain 
endmembers  in  the  proportions  denoted  by  a.  Combining  this  fact  with  another  approximation,  it 
can  be  shown  that  Pa(a)  is  proportional  to  if  (a). 

Hence,  for  a  particular  mixed  pixel,  the  objectives  are  to  find  the  abimdances  that  maximize 
(37)  and  to  satisfy  the  obvious  requirement  that  x  —  Sa  in  the  presence  of  additive  noise.  Taken 
separately,  these  goals  translate  to  expressions  for  prior  and  forward  densities,  respectively,  which 
may  be  multiphed  to  yield  the  posterior  density  in  (32).  The  log-posterior  density  is 

fiiPalx(a|x)  oc  J?(a)  -  y(x  -  Sa)^I]~^(x  -  Sa).  (38) 

K  is  a.  constant  that  is  included  to  account  for  the  approximation  linking  Pa(a)  to  ff  (a).  Thus,  the 
overall  objective  function  to  be  optimized  is  a  sum  of  an  entropy  calculation  and  a  Mahalanobis 
distance. 
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4.  ALGORITHM  TAXONOMY:  NON-LINEAR  UNMIXING 


We  have  not  presented  a  taxonomy  for  non-linear  unmixing  algorithnos  for  two  reasons.  First, 
the  high-level  categories  in  Figure  2  are  predicated  on  the  the  linear  mixing  model  in  (1)  and  the 
properties  induced  by  its  amenability  to  practical  mathematical  and  statistical  tools.  Approaches  to 
non-linear  immixiug  are  driven  by  the  physically  rigorous  models  in  Section  2.2,  which  do  not  easily 
accept  the  tools  apparent  in  linear  unmixing.  The  structure  in  Figure  2  is  primarily  predicated 
on  mathematical  and  statistical  considerations.  In  this  sense,  there  is  a  mutual  orthogonality  that 
inhibits  physics-based  analysis  from  co-existing  with  simple,  structured  mathematical  algorithms. 

Second,  and  most  importantly,  in  comparison  to  the  number  of  algorithms  aimed  at  linear 
unmixing,  very  httle  research  has  been  directed  toward  tackling  the  non-linear  problem  in  a  sys¬ 
tematic  way.  Some  of  the  efforts  are  stiff  exploring  the  sufficiency  of  the  linear  paradigm  and  have 
yielded  useful  experimental  results  [21,24].  Others  have  meticulously  developed  physical  mixing 
models  from  the  perspectives  outlined  in  Section  2.2;  in  these  cases,  their  unwieldiness  is  mitigated 
by  common-sense  physical  approximations  that  permit  simple  mathematical  techniques  to  estimate 
fractional  abimdances  [8,14,21].  For  example,  (2)  can  be  solved  for  a  using  the  method  of  least 
squares,  which  is  documented  as  an  inversion  technique  in  Figure  4.  As  non-linear  unmixing  tech¬ 
niques  mature,  a  taxonomy  comparable  to  the  ones  in  Figures  3-5,  but  built  on  physical  principles, 
will  naturally  emerge. 
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5.  PERFORMANCE  COMPARISONS 


Spectral  unmixing  has  emerged  as  a  key  application  arising  from  the  wealth  of  spectral  mea¬ 
surements  in  hyperspectral  processing.  Several  communities  have  shown  great  interest  in  the  de- 
compositional  analysis  of  mixed  pixels.  Unmixing  attempts  to  decompose  mixed  pixels  in  terms  of 
distinct,  unique  substances,  and  provide  a  foundation  for  doing  sub-pixel  material  identification. 

We  undertake  this  comparison  of  unmixing  algorithm  performance  with  the  knowledge  that 
many  algorithms  exist,  and  new  methods  are  constantly  being  explored  and  tested.  Several  disci- 
phnes  are  participating  in  the  attempt  to  perform  unmixing,  such  as  geology,  geophysics,  engineer¬ 
ing,  and  analytical  chemistry. 

5.1  DATA 

For  the  purpose  of  our  comparisons,  we  utilize  a  natural  scene  that  was  imaged  by  the 
HYDICE  sensor.  The  scene  in  Figure  7  was  collected  at  approximately  noon  on  September  22, 
1997  as  part  of  the  Alpine  Radiance  I  data  collection  in  Northern  California  and  consists  of  320 
lines  of  data,  each  having  306  samples  with  210  spectral  bands.  The  scene  is  a  gently  sloping  open 
area  at  an  elevation  of  8500  feet  that  is  bordered  by  both  mature  and  yoimg  lodgepole  pine  trees, 
as  well  as  areas  of  granite-derived  loams  that  are  covered  in  places  by  low-lying  sage  brush,  aspen 
trees,  and  cedar  trees.  There  were  no  clouds  during  the  data  collection  and  visibihty  was  deemed 
to  be  high. 

Of  the  210  spectral  bands,  63  bands  were  discarded  because  they  reside  in  the  electromagnetic 
intervals  where  water  vapor  vapor  absorption  is  dominant,  and  the  subsequent  signal-to-noise  ratio 
(SNR)  is  too  low.  Plots  of  spectra  will  have  gaps  to  represent  the  absence  of  data  in  those  intervals. 
The  radiance  measurements  for  the  remaining  147  bands  were  also  corrected  for  a  problem  with 
the  absolute  calibration  of  the  in-flight  calibration  bulb  that  affected  the  measurements  in  the 
1700  -  2500  nm  range.  To  rectify  this,  a  coefficient  vector  was  appHed  to  the  radiance  data  before 
atmospheric  compensation  was  performed  on  the  scene.  Atmospheric  compensation  was  performed 
on  the  scene  in  Figure  7  to  recover  surface  reflectance  values. 
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Figure  7.  Alpine  Radiance  Scene  I.  The  data  was  collected  around  noon  on  September  22,  1997. 
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5.2  SELECTED  ALGORITHMS 


In  this  comparison  of  algorithm  performance,  we  restrict  the  set  of  algorithms  that  we  con¬ 
sider  to  only  those  that  are  representative  of  a  fundamental  approach,  and  ignore  approaches  that 
are  variations  on  the  same  concept.  Moreover,  although  it  is  the  first  stage  of  unmbdng,  we  do  not 
compare  any  algorithms  that  perform  dimension  reduction,  but  instead  present  some  basic  statis¬ 
tical  information  about  the  scene  in  Figure  7  that  provides  insight  on  the  algorithm  performance 
for  endmember  determination  and  inversion. 


5.3  APPROACH 

Unfortunately,  the  ground  truth  that  exists  for  the  scene  in  Figure  7  is  insuflBcient  to  cor¬ 
roborate  the  results  from  unmixing  algorithms.  The  lack  of  accurate  sub-pixel  measurements  is  a 
significant  problem  in  tr5dng  to  assess  the  absolute  performance  of  unTnixiug  algorithms.  In  spite 
of  this,  we  attempt  to  docmnent  the  behavior  of  common  algorithms  and  make  judgments  about 
their  ability  to  achieve  goals  that  are  desirable  for  any  unmixing  algorithm. 


5.4  SCENE  STATISTICS 

To  provide  a  statistical  foundation  for  the  scene  in  Figure  7,  we  can  first  investigate  its 
statistical  properties.  The  scene  is  comprised  of  97920  pixels,  characterized  by  147  reflectance 
samples.  In  Figure  8,  the  covariance,  the  cumulative,  normalized  eigenvalues,  and  the  first  five 
eigenvectors  are  plotted. 

It  is  clear  from  Figure  8(b)  that  the  effective  dimensionality  of  the  spectra  in  the  scene 
is  relatively  low.  In  order  to  account  for  99%  of  the  power  in  the  scene,  only  three  principal 
components  are  necessary.  For  reference,  the  first  five  eigenvectors  for  the  scene  are  also  plotted  in 
Figure  8(c). 


5.5  ENDMEMBER  DETERMINATION 

We  will  compare  two  techniques  for  autonomously  determining  the  endmembers  in  a  scene. 
The  first  is  a  technique  based  on  geometric  interpretation  of  the  data  in  the  scene,  and  exploits  the 
parallelism  between  the  linear  mixing  model  (LMM)  and  the  theory  of  convex  sets  [3,5,9].  The 
second  is  based  on  the  idea  of  clustering  the  data  and  finding  abundances  and  endmembers  that 
minimize  an  overall  cost  function. 

5.5.1  Geometric  Methods 

In  our  experiment,  our  goal  was  to  determine  the  endmembers  from  the  vertices  of  a  simplex 
that  is  shrinkwrapped  aroimd  the  high-dimensional  volume  defined  by  the  data  in  the  scene.  Ge¬ 
ometric  methods  identify  endmembers  in  a  scene  from  the  vertices  of  a  minimum-voliune  simplex 
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that  encapsulates  the  data.  A  simplex  must  necessarily  have  one  more  vertex  (and  facet)  than  the 
dimension  of  the  data  it  encloses,  and  thus,  the  number  of  endmembers  is  one  greater  than  the 
effective  dimension  of  the  data. 

Using  the  results  in  Figure  8,  we  tested  the  geometric  method  under  the  different  assumptions 
that  the  effective  dimension  of  the  data  was  3,  4,  5,  and  6.  Based  on  the  results  in  Section  5.4, 
we  initially  presumed  that  the  approximate  dimension  of  the  data  was  three,  and  we  reduced  the 
original  147-dimensional  data  to  three  dimensions  using  a  linear  transform  constructed  from  the 
first  three  principal  components  of  the  covariance.  Numerous  variations  exist  on  the  algorithm, 
especially  in  the  dimension  reduction  performed  on  the  data,  we  felt  it  was  reasonable  to  use  PCA, 
since  it  is  optimal  when  optimally  preserving  the  energy  in  the  signals  [35]. 

After  reducing  the  dimension  to  three,  we  supplied  the  data  to  a  convex  hull  algorithm.  We 
used  the  QHULL  algorithm  (http://www.geom.umn.edu/software/qhull/)  to  implement  this  pro¬ 
cedure  and  then  submitted  the  convex  hull  of  the  3-dimensional  points  in  the  scene  to  a  shrinkwrap¬ 
ping  algorithm  based  on  the  formulation  of  the  Fixed-Point  Free  Transform(FPFT)  [9].  The  four 
endmember  reflectance  spectra  produced  by  this  technique  are  depicted  in  Figure  9. 

The  increasing  number  of  endmembers  in  each  plot  from  Figure  9  highlights  a  new,  distinct 
spectra  from  the  scene  in  Figure  7.  As  the  number  increases,  new  classes  appear  that  were  originally 
lumped  in  with  other  classes  for  lower-dimensional  trials  of  the  algorithm. 

5.5.2  Clustering 

In  contrast  to  the  geometric  technique  in  Section  3.3. 1.1  for  endmember  determination,  clus¬ 
tering  techniques  have  employed  iterative,  statistical  methods  to  arrive  at  estimates  of  endmembers 
as  well  as  abundances.  Despite  the  variations,  the  large  majority  of  approaches  arrive  at  estimates 
using  second-order  statistics  to  define  statistical  classes  for  each  endmember  and  by  relating  the 
fractional  abundances  of  each  pixel  to  partial  membership  in  each  class.  Then,  upon  iteration, 
estimates  of  abundances  and  endmembers  minimize  a  cost  function.  When  the  cost  function  arises 
from  a  probabilistic,  Gaussian  formulation,  solutions  often  provide  Maximum  Likelihood  (ML) 
solutions  [17,30,33]. 

We  consider  for  our  experiments  a  non-probabilistic,  statistical  technique  that  provides  the 
foimdation  for  many  of  the  clustering-based  approaches  to  unmixing.  The  idea  of  fuzzy  K-means 
clustering  [10]  attempts  to  optimize  a  least-squares-based  cost  function  by  finding  the  best  end- 
members  and  abundances.  For  a  collection  of  N  pixels,  the  objective  function  may  be  expressed 
as 


P  N 

J^(A,  S)  =  ^  (39) 

i=l  j=i 

where  Aij  is  an  estimate  of  the  z-th  abundance  for  the  j-th  pixel  and  dij^  the  weighted  squared-error 
between  the  j-th  pixel  and  the  z-th  centroid,  or  endmember,  Si,  is 

=  (40) 
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Figure  9.  Endmembers  for  geometric  technique  using  (a)  3-cornered  simplex;  (b)  4-cornered  simplex;  (c) 
5-cornered  simplex;  (d)  6-comered  simplex. 


^  ^ 


In  our  experiments,  we  let  W  =  /,  and,  upon  the  recommendation  of  previous  investigators,  let 
r  =  2.5.  The  endmembers  that  result  for  the  initial  assertion  that  there  are  three  classes  appear  in 
Figure  10. 

5.6  INVERSION 

Endmember  determination  techniques  identify  the  unique  spectra  that  define  the  components 
of  the  LMM.  The  complementary  task  of  unmixing  is  to  identify  the  proportions  in  which  the 
endmembers  appear  for  each  pixel  in  the  scene  being  analyzed.  Nmnerous  algorithms  attempt  to 
use  pre-determined  endmembers  to  estimate  abimdances,  but  the  vast  majority  of  techniques  are 
based  on  the  method  of  least  squares,  which  attempts  to  minimize  ||x  —  Sa|p. 

A  vital  part  of  inversion  is  including  two  important,  physical  constraints.  The  full  additivity 
constraint  requires  that  the  non-negativity  constraint  requires  that  Oj  >  0,i  = 

1, . . .  ,P.  Several  efforts  have  been  made  to  incorporate  these  constraints  into  the  unconstrained 
least  squares  solution,  eP  =  (S^S)“^S^x  [31].  In  addition  to  the  unconstrained  least  squares 
solution,  we  will  consider  two  more  inversion  methods  in  our  performance  comparisons.  The  fuU 
additivity  least  squares  solution  [16]  is  given  by 

a^  =  -  (S^S)-^Z^[Z(S^S)-iZ^]-^(ZaU  -  b).  (41) 

a^  is  the  least  squares  estimate  of  the  abimdances  when  the  fuU  additivity  constraint  is  enforced, 
Z  is  an  12  X  P  matrix  and  b  is  an  i2  x  1  vector  that  together  convey  the  R  linearly  independent 
equations,  Za  =  b,  that  serve  as  linear  constraints  on  the  solution  of  (1).  To  enforce  full  additivity, 
Z  is  a  1  X  P  row  vector  whose  entries  are  all  ones,  and  b  =  1.  Examining  (41)  reveals  that  the 
solution  enforcing  full  additivity  consists  of  the  unconstrained  LSE  solution,  a^,  with  an  additive 
correction  term  that  depends  on  the  matrix  of  endmembers,  S,  and  the  error  incurred  by  the 
unconstrained  solution,  a^,  in  satisfying  the  full  additivity  constraint. 

Figure  11  conveys  the  results  of  unconstrained  least  squares  unmixing  when  the  six  endmem¬ 
bers  derived  from  the  geometric  endmember  algorithm  are  used  (one  is  discarded  for  being  mostly 
negative).  Also  included  is  a  figxue  illustrating  the  sum  of  the  estimated  abimdances.  Clearly,  the 
unconstrained  least  squares  version  does  not  sum  to  one  in  the  case  of  a  majority  of  pixels  in  the 
scene. 
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6.  SUMMARY 


We  have  introduced  three  algorithm  taxonomies  in  Figures  3,  4,  and  5  for  hyperspectral 
unmixing  whose  organization  begins  with  the  features  in  Figure  2.  The  majority  of  algorithms  are 
designed  aroxmd  a  combination  of  three  common  assumptions:  linear  mixing,  Gaussianity,  and,  the 
minimization  of  squared-error.  A  key  discriminating  feature  of  algorithms  is  their  use  of  statistics, 
which  has  significant  imphcations,  especially  in  detection,  when  sub-pixel  targets  are  comparatively 
rare  and  their  discerning  features  may  be  lost  in  aggregate  measures.  Likewise,  the  use  of  parametric 
models  to  describe  the  random  behavior  of  pixels  is  focused  primarily  on  Gaussian  models,  but  some 
efforts  have  been  attempted  to  exploit  higher-order  statistics.  While  non-hnear  unmixing  has  been 
investigated  by  a  number  of  researchers,  it  is  still  premature  to  develop  a  coherent  taxonomy  of 
algorithms  in  this  field. 

It  is  clear  that  methods  for  unmixing  that  incorporate  a  strong  physical  intuition  have  a  better 
chance  of  successfully  immixing  pixels  in  a  hyperspectral  image.  We  saw  in  our  experiments  that 
the  geometric  endmember  determination  algorithm  was  able  to  extract  spectra  that  bore  a  strong 
resemblance,  at  least  in  shape,  to  distinct  spectra  in  the  scene.  In  contrast,  the  fuzzy  clustering 
technique  did  not  exhibit  the  ability  to  yield  endmembers  with  a  strong  physical  correlation  to 
spectra  in  the  scene.  As  a  consequence,  its  abundance  planes  did  not  exhibit  a  strong  ability  to 
highlight  distinct  substances.  More  evaluation  must  be  performed  with  other  algorithms  and  with 
data  that  has  been  ground-truthed. 

Unmixing  is  an  important  class  of  algorithm  for  hyperspectral  processing.  It  is  an  important 
contributor  to  vital  mihtary  apphcations  such  as  terrain  characterization  and  traBBcability  that  lead 
to  an  intelligent  preparation  of  a  battlespace.  The  combination  of  physical  modeling  and  mathe¬ 
matical  techniques  is  capable  of  yielding  information  from  scenes  that  was  previously  unattainable 
by  other  sensors.  Without  accurate  ground  truth,  however,  complete  unmixing  of  scenes  is  difficult 
to  corroborate.  Nevertheless,  unmixing  holds  the  potential  of  revealing  more  firom  hyperspectral 
scenes  than  any  other  application. 
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GLOSSARY 


BDRF 

Bi-direction  reflectance  function 

BLUE 

Best  linear  unbiased  estimator 

DPFT 

Dark-point-fixed  transform 

DUSD 

Deputy  Under  Secretary  of  Defense 

EM 

Expectation  maximization 

FPFT 

Fixed-point-free  transform 

FVA 

Filter  vector  algorithm 

ICA 

Independent  component  analysis 

LMM 

Linear  noixing  model 

LSE 

Least  squares  error 

MAP 

Maximum  a  posteriori 

ML 

Maximum  likelihood 

MNF 

Maximiun  noise  fraction 

MSE 

Mean  squared  error 

MVT 

Minimum  volume  transform 

MVUE 

Minimum  variance  unbiased  estimator 

NAPC 

Noise  adjusted  principle  components 

ORASIS 

Optical  real-time  adaptive  spectral  identification  system 

PCA 

Principal  component  analysis 

QICAR 

Quasi-intrinsic  Gaussian  conditional  autoregression 

SMM 

Stochastic  mixing  model 

SNR 

Signal-to-noise  ratio 

SSM 

Salient  selection  module 

SVD 

Singular  value  decomposition 

TBLS 

Total  block  least  squares 
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